Propagation of Outliers in Multivariate Data
Abstract
We investigate the performance of robust estimates of multivariate location under nonstandard data contamination models such as componentwise outliers (i.e., contamination in each variable is independent from the other variables). This model brings up a possible new source of statistical error that we call “propagation of outliers.” This source of error is unusual in the sense that it is generated by the data processing itself and takes place after the data has been collected. We define and derive the influence function of robust multivariate location estimates under flexible contamination models and use it to investigate the effect of propagation of outliers. Furthermore, we show that standard high-breakdown affine equivariant estimators propagate outliers and therefore show poor breakdown behavior under componentwise contamination when the dimension d is high.
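The propagation effect described above can be illustrated with a short calculation. This is a minimal sketch, not the paper's own derivation: it assumes each cell of a data row is contaminated independently with the same probability `eps`, and the function names `contaminated_row_fraction` and `simulate_row_fraction` are hypothetical. Under that assumption, a row of dimension d contains at least one contaminated cell with probability 1 - (1 - eps)^d. An affine equivariant estimator works with whole rows, so this row-level fraction is the contamination it effectively faces.

```python
import random


def contaminated_row_fraction(eps: float, d: int) -> float:
    """Probability that a row of d cells has at least one contaminated cell.

    Assumes each cell is contaminated independently with probability eps
    (the componentwise contamination model; an assumption of this sketch).
    """
    return 1.0 - (1.0 - eps) ** d


def simulate_row_fraction(eps: float, d: int, n_rows: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    hit = 0
    for _ in range(n_rows):
        # A row counts as contaminated if any of its d cells is flagged.
        if any(rng.random() < eps for _ in range(d)):
            hit += 1
    return hit / n_rows


if __name__ == "__main__":
    eps = 0.05  # hypothetical per-cell contamination probability
    for d in (1, 5, 10, 14, 20, 50):
        exact = contaminated_row_fraction(eps, d)
        approx = simulate_row_fraction(eps, d)
        print(f"d={d:3d}  exact={exact:.4f}  simulated={approx:.4f}")
```

With a per-cell rate of 5%, the fraction of affected rows exceeds one half once d reaches 14. No affine equivariant estimator can have a breakdown point above roughly one half, which is consistent with the poor breakdown behavior in high dimension reported in the abstract.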
Similar resources
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased parameter estimates, and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
Local multivariate outliers as geochemical anomaly halos indicators, a case study: Hamich area, Southern Khorasan, Iran
Anomaly recognition has always been a prominent subject in preliminary geochemical exploration. For regional geochemical data processing, there is a range of statistical and data mining techniques, as well as different mapping methods that serve to present the outputs. Outlier values are of interest in investigations where data are gathered under controlled condition...
Application of robust multivariate control chart with Winsorized Mean: a case study
Water pH and active ingredient concentration are two of the most important variables to consider in the manufacturing process of fungicides. If these variables do not meet the required standards, the quality of the product may be compromised and lead to poor fungicide performance when water is used as the application carrier, which is in most cases. Given the correlation between the variable...
Comparison of artificial neural network and multivariate regression methods in prediction of soil cation exchange capacity (Case study: Ziaran region)
Investigating soil properties such as cation exchange capacity (CEC) plays an important role in environmental research, and the spatial and temporal variability of this property has led to the development of indirect methods for estimating it. Pedotransfer functions (PTFs) provide an alternative by estimating soil parameters from more readily available soil data...
The Grand Tour as a Method for Detecting Multivariate Outliers
A method of viewing multivariate data vectors and identifying outliers among them is described. The method is applied to two sets of benchmark data: Brownlee's stack-loss data and the Hawkins–Bradu–Kass data. All the outliers contained in these data are easily identified. Generally, it is expected that the method will yield good results (i.e., will find the outliers) for data having elliptical or nearly...